10 Jun 2025
I owe a debt of gratitude to many people, as the thoughts and code in these slides are the product of years-long development cycles and discussions with my team, friends, colleagues and peers. When someone has contributed to the content of the slides, I have credited their authorship.
These materials are generated by Gerko Vink, who holds the copyright. The intellectual property belongs to Utrecht University. Images are either directly linked, or generated with StableDiffusion or DALL-E. That said, there is no information in this presentation that exceeds legal use of copyright materials in academic settings, or that should not be part of the public domain.
Warning
You may use any and all content in this presentation - including my name - and submit it as input to generative AI tools, with the following exception:
Materials
Let’s start with the core:
Statistical inference
Statistical inference is the process of drawing conclusions from truths
Truths are boring, but they are convenient.
\(^1\) See Jelke Bethlehem’s CBS discussion paper for an overview of the history of survey sampling
Without any data we can still come up with a statistically valid answer.
Some sources of information can already greatly improve the precision of our answer.
In Short
Information bridges the answer to the truth. Too little information may lead you to a false truth.
Good questions to ask yourself
Hmmm…
Would that mean that if we simply observe every potential unit, we would be unbiased about the truth?
The problem is a bit larger
We have three entities at play, here:
The more features we use, the more we capture about the outcome for the cases in the data
The more cases we have, the more we approach the true information
All these things are related to uncertainty. Our model can still yield biased results when fitted to \(\infty\) features. Our inference can still be wrong when obtained on \(\infty\) cases.
Core assumption: all observations are bona fide
When we do not have all information …
In some cases we estimate that we are only a bit wrong. In other cases we estimate that we could be very wrong. This is the purpose of testing.
The uncertainty measures about our estimates can be used to create intervals
An intuitive approach to evaluating an answer is confidence. In statistics, we often use confidence intervals. Discussing confidence can be hugely informative!
If we were to draw 100 samples from a population and construct a 95% CI from each, then at least 95 of those 100 intervals would be expected to cover the true population value.
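This coverage interpretation can be checked by simulation. The sketch below (all population values and sample sizes are made-up, illustrative choices) draws many samples, builds a 95% z-interval for the mean from each, and counts how often the truth is covered:

```python
# Minimal sketch of the repeated-sampling interpretation of a 95% CI.
# Population mean/sd, sample size and number of samples are illustrative.
import numpy as np

rng = np.random.default_rng(2025)
true_mean, sd, n, n_samples = 100.0, 15.0, 50, 1000

covered = 0
for _ in range(n_samples):
    sample = rng.normal(true_mean, sd, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    # 95% z-interval for the mean of this sample
    lower, upper = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += lower <= true_mean <= upper

print(f"Empirical coverage: {covered / n_samples:.3f}")
```

The empirical coverage will hover around 0.95: the confidence statement is about the procedure over repeated samples, not about any single interval.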
Neyman, J. (1934). On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society Series A: Statistics in Society, 97(4), 558-606.
We can replicate our sample.
Validating a model's inferences by drawing fresh samples from the population is a lot of work.
Under some general assumptions, we can use the same data to validate our model’s inferences and predictions.
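One such technique is the bootstrap: resampling the observed data with replacement stands in for replicating the sample. A minimal sketch, using made-up data (the specific numbers are illustrative, not from the slides):

```python
# Bootstrap sketch: the spread of the statistic across resamples of the
# observed data estimates its sampling uncertainty.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(5.0, 2.0, 200)  # stand-in for an observed sample

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap percentile interval for the mean: ({lower:.2f}, {upper:.2f})")
```

The same observed data are used both to compute the estimate and to gauge its uncertainty, which is exactly the appeal: no new samples are needed.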
Take the following definition:
Assumptions are a statistician's faith. It is often impossible to prove that they hold in practice, but we choose to believe that they do.
Sensitivity analyses
I often use computational evaluation techniques to quantify the impact of the assumptions we make. For example, we can deliberately violate an assumption and study the effect on our results. We then verify whether the inferences are sensitive to such violations. We can even establish the extent to which an assumption must be violated before it starts to influence our inferences.
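One way such a sensitivity analysis can look (a minimal sketch with illustrative settings; the distributions and sample size are my own choices, not from the slides): compare the coverage of a standard 95% interval when its normality assumption holds against coverage when the data are heavily skewed.

```python
# Sensitivity sketch: how does CI coverage react when normality is violated?
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 2000

def coverage(draw, true_mean):
    """Share of 95% z-intervals for the mean that cover the truth."""
    hits = 0
    for _ in range(reps):
        x = draw()
        se = x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - true_mean) <= 1.96 * se
    return hits / reps

# Scenario 1: the normality assumption holds.
cov_normal = coverage(lambda: rng.normal(0, 1, n), 0.0)

# Scenario 2: the assumption is violated (heavily skewed lognormal data).
sigma = 1.5
true_ln_mean = np.exp(sigma**2 / 2)  # E[lognormal(0, sigma)]
cov_skewed = coverage(lambda: rng.lognormal(0, sigma, n), true_ln_mean)

print(f"Coverage, assumption holds:    {cov_normal:.3f}")
print(f"Coverage, assumption violated: {cov_skewed:.3f}")
```

The drop in coverage under skewness quantifies how sensitive the inference is to the violated assumption; varying `sigma` would map out where the assumption starts to matter.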
Whenever I evaluate something, I tend to look at three things:
As a function of model complexity, these components play a role in the bias/variance tradeoff.
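The tradeoff can be sketched with a small simulation (the signal, noise level and polynomial degrees below are illustrative choices of mine): an overly simple model misses the signal (bias), an overly flexible one chases the noise (variance).

```python
# Bias/variance sketch: polynomial fits of increasing complexity.
import numpy as np

rng = np.random.default_rng(7)

# Illustrative data: a smooth signal plus noise.
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)
x_test = np.linspace(0, 1, 200)
y_test_true = np.sin(2 * np.pi * x_test)

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x, y, degree)  # fit polynomial of given complexity
    train_mse[degree] = np.mean((np.polyval(coefs, x) - y) ** 2)
    test_mse[degree] = np.mean((np.polyval(coefs, x_test) - y_test_true) ** 2)
    print(f"degree {degree}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

Training error only ever decreases with complexity, while error against the true signal is lowest at an intermediate degree: that gap is the tradeoff in action.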
Individual intervals can also be hugely informative!
Individual (prediction) intervals are generally wider than confidence intervals, because they must also account for the variability of a single new observation.
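The width difference follows directly from the interval formulas in simple linear regression. A minimal sketch (the data-generating values are made up for illustration):

```python
# CI for the mean response vs. prediction interval for a new observation
# at the same point x0, in simple linear regression.
import numpy as np

rng = np.random.default_rng(11)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)  # illustrative linear data

# Ordinary least squares by hand.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(resid @ resid / (n - 2))  # residual standard error

x0 = 5.0
sxx = np.sum((x - x.mean()) ** 2)
leverage = 1 / n + (x0 - x.mean()) ** 2 / sxx
se_mean = s * np.sqrt(leverage)      # SE of the fitted mean at x0
se_pred = s * np.sqrt(1 + leverage)  # SE for a single new observation at x0

t_crit = 2.05  # approx. t(0.975, df = 28)
ci_half = t_crit * se_mean
pi_half = t_crit * se_pred
print(f"CI half-width at x0: {ci_half:.2f}, PI half-width: {pi_half:.2f}")
```

The prediction interval carries an extra "+1" inside the square root for the noise in an individual outcome, so it is always wider than the confidence interval at the same point.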
Be careful
Narrower intervals mean less uncertainty.
It does not mean that the answer is correct!
39 years ago, on 28 January 1986, 73 seconds into its flight and at an altitude of 9 miles, the space shuttle Challenger broke up in an enormous fireball caused by the failure of one of its two solid rocket boosters. The crew compartment continued its trajectory, reaching an altitude of 12 miles, before falling into the Atlantic. All seven crew members, five astronauts and two payload specialists, were killed.
In the decision to proceed with the launch, dark data were present. And no one noticed!
This missing information has the potential to mislead people. The notion that we can be misled is essential because it also implies that artificial intelligence can be misled!
If you don’t have all the information, there is always the possibility of drawing an incorrect conclusion or making a wrong decision.
We now have a new problem:
What would be a simple solution to allowing for valid inferences on the incomplete sample? Would that solution work in practice?
There are two sources of uncertainty that we need to cover when analyzing incomplete data:
A straightforward and intuitive solution for analyzing incomplete data in such scenarios is multiple imputation (Rubin, 1987).
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons.
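After imputing m times, each completed dataset is analyzed as usual and the m results are combined with Rubin's rules. A minimal pooling sketch (the estimates and variances below are hypothetical numbers, e.g. a regression coefficient from m = 5 completed-data analyses; the slides use R's mice, but the arithmetic is language-agnostic):

```python
# Rubin's rules: pool m completed-data estimates into one inference.
import numpy as np

# Hypothetical estimate and its variance from each of m imputed datasets.
estimates = np.array([0.52, 0.47, 0.55, 0.50, 0.49])
variances = np.array([0.010, 0.012, 0.009, 0.011, 0.010])
m = len(estimates)

qbar = estimates.mean()         # pooled estimate
ubar = variances.mean()         # within-imputation variance
b = estimates.var(ddof=1)       # between-imputation variance
total = ubar + (1 + 1 / m) * b  # Rubin's total variance

print(f"Pooled estimate: {qbar:.3f}, total variance: {total:.4f}")
```

The total variance exceeds the average within-imputation variance by the between-imputation component: exactly the two sources of uncertainty, about the analysis and about the imputed values, that an incomplete-data inference must cover.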
mice
iter imp variable
1 1 hgt wgt bmi hc gen phb tv reg
1 2 hgt wgt bmi hc gen phb tv reg
1 3 hgt wgt bmi hc gen phb tv reg
1 4 hgt wgt bmi hc gen phb tv reg
1 5 hgt wgt bmi hc gen phb tv reg
2 1 hgt wgt bmi hc gen phb tv reg
2 2 hgt wgt bmi hc gen phb tv reg
2 3 hgt wgt bmi hc gen phb tv reg
2 4 hgt wgt bmi hc gen phb tv reg
2 5 hgt wgt bmi hc gen phb tv reg
3 1 hgt wgt bmi hc gen phb tv reg
3 2 hgt wgt bmi hc gen phb tv reg
3 3 hgt wgt bmi hc gen phb tv reg
3 4 hgt wgt bmi hc gen phb tv reg
3 5 hgt wgt bmi hc gen phb tv reg
4 1 hgt wgt bmi hc gen phb tv reg
4 2 hgt wgt bmi hc gen phb tv reg
4 3 hgt wgt bmi hc gen phb tv reg
4 4 hgt wgt bmi hc gen phb tv reg
4 5 hgt wgt bmi hc gen phb tv reg
5 1 hgt wgt bmi hc gen phb tv reg
5 2 hgt wgt bmi hc gen phb tv reg
5 3 hgt wgt bmi hc gen phb tv reg
5 4 hgt wgt bmi hc gen phb tv reg
5 5 hgt wgt bmi hc gen phb tv reg
Gerko Vink @ Anton de Kom Universiteit, Paramaribo